Project 4

Project Description Part One (Image Warping and Mosaicing)

The first part of this project builds image mosaics by manually labeling correspondence points and computing a warping matrix from them. Unlike the affine transformation used for face morphing in project 3, which has 6 degrees of freedom, the projective warping matrix (homography) used in image mosaicing has 8 degrees of freedom.

In part one, part 1.2 computes the homography warping matrix, part 1.3 performs the actual image warping, part 1.4 demonstrates it through image rectification, and part 1.5 combines all the pieces to blend two images into a mosaic.

1.1 Shoot the Pictures

All pictures were taken with my iPhone XR. I downsized them from (4000, 3000) to (500, 375) to reduce computation cost. The gym cube and gym rules images are used in parts 1.3 and 1.4 as test cases before two images are finally combined into a mosaic in part 1.5.


Gym Cube

Gym Rules Board


1.2 Recover Homographies

In order to compute the projective warping matrix, I first need to manually label correspondence points. The minimum number of correspondence points is 4, because the transformation has 8 degrees of freedom and each point pair contributes two equations. I then set up a linear system of equations of the form Ah = b, where h is a vector holding the 8 variables that define the projective warp.

It is recommended to use more than 4 correspondence points, because the resulting homography is more stable and less sensitive to noise. In that overdetermined case, I solved for h with least squares.
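A minimal sketch of this least-squares setup, assuming two (N, 2) arrays of corresponding (x, y) points (the function name and point format are illustrative choices, not necessarily my exact code):

```python
import numpy as np

def compute_homography(pts1, pts2):
    """Solve for the 3x3 homography H that maps pts1 to pts2.

    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    With more than 4 correspondences the system Ah = b is overdetermined
    and is solved in the least-squares sense.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # The ninth entry of H is fixed to 1, which is where the 8 degrees of freedom come from.
    return np.append(h, 1).reshape(3, 3)
```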



1.3 Warp the Images

This part is very similar to the color interpolation used for face morphing in project 3. In fact, I copied my implementation over and then modified it. The main change is that a projective transformation often produces an output with a different shape from the original image. Therefore, before interpolating colors with the inverse warping matrix, I first used the homography to map the four corners of the input image onto the output canvas to determine its size. In addition, since not every pixel in the output image has a corresponding pixel in the input image, I generated a mask marking which pixels do.
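A rough sketch of this inverse-warping logic, assuming the compute_homography helper above and SciPy's map_coordinates for the bilinear color sampling (my actual project 3 code may use a different interpolation call):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(im, H):
    """Inverse-warp `im` (H x W x 3, floats) with homography H.

    Returns the warped image on its own canvas and a mask of the output
    pixels whose source falls inside the input image. The canvas offset
    (x_min, y_min) would also be needed to place the result on a mosaic.
    """
    h, w = im.shape[:2]
    # 1. Push the four input corners through H to size the output canvas.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1], [w - 1, h - 1, 1], [0, h - 1, 1]]).T
    c = H @ corners
    c = c[:2] / c[2]
    x_min, y_min = np.floor(c.min(axis=1)).astype(int)
    x_max, y_max = np.ceil(c.max(axis=1)).astype(int)
    out_h, out_w = y_max - y_min + 1, x_max - x_min + 1

    # 2. Map every output pixel back into the input with the inverse homography.
    xs, ys = np.meshgrid(np.arange(x_min, x_max + 1), np.arange(y_min, y_max + 1))
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ pts
    src = src[:2] / src[2]

    # 3. Mask out output pixels that land outside the input image.
    valid = (src[0] >= 0) & (src[0] <= w - 1) & (src[1] >= 0) & (src[1] <= h - 1)

    # 4. Bilinear color interpolation (map_coordinates expects (row, col) order).
    out = np.zeros((out_h, out_w, 3))
    coords = np.stack([src[1], src[0]])
    for ch in range(3):
        out[..., ch] = (map_coordinates(im[..., ch], coords, order=1) * valid).reshape(out_h, out_w)
    return out, valid.reshape(out_h, out_w)
```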

As shown in the gym rules example below, the viewing perspective of the gym rules board changes completely, from looking at it from the left side to looking at it head on. The shape of the output image is also very different from the shape of the input image. The black areas in the output are pixels that have no corresponding pixels in the input image.



1.4 Image Rectification

As shown in the gym cube example below, a projective warp can not only change the viewing perspective of an object (as in the gym rules example above), it can also produce a zoom-in effect along with the perspective change if the correspondence points are labeled appropriately.
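Rectification is just the same warp with hand-picked target points; a hypothetical call for the gym cube face, reusing the helpers above and with made-up pixel coordinates, could look like:

```python
import numpy as np

# Hypothetical rectification of the gym cube face: the four hand-labeled corners
# of the (roughly square) face are sent to an axis-aligned square. The pixel
# coordinates below are made up for illustration only.
im_pts = np.array([[212, 140], [330, 155], [325, 270], [205, 260]])   # labeled corners (x, y)
square = np.array([[0, 0], [200, 0], [200, 200], [0, 200]])           # chosen target square

H = compute_homography(im_pts, square)
rectified, mask = warp_image(cube_im, H)   # cube_im: the gym cube photo
```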



1.5 Blend the Images into a Mosaic

Before blending the images into a mosaic, I first had to be careful when taking the pictures. The requirement is that the two images be taken from the same center of projection (COP).

In all three of my examples below, I warped the left image into the projection of the right image; in other words, the right image stays unwarped. I usually labeled slightly more than 10 correspondence points. To align the warped left image with the unwarped right image, I used the align_image_code.py script provided in project 2.

If I were to blend the two images naively through simple addition, the overlapping region would be overexposed, and the seam around the border of the overlap would be very obvious. To solve this, I did two things. First, I reused my multiresolution blending implementation from project 2, which builds a Laplacian pyramid; for these mosaics a 2-level pyramid gives good results. Second, I modified the mask used in the Laplacian pyramid. Previously, my mask was a simple horizontal or vertical split dividing the canvas in half. In this project, I made the mask adapt to the shape of each mosaic:

1. Create a binary mask for each image.
2. Identify the overlapping region.
3. For pixels outside the overlap that have color values, set the mask pixel to 1; otherwise assign 0.
4. For each pixel inside the overlap, use a distance transform to determine whether it is closer to image 1 or image 2; assign 1 in the mask of the closer image and 0 in the mask of the farther one.
5. Finally, as in my project 2 implementation, apply a Gaussian filter to both masks at each level of the pyramid.

A sketch of the distance-transform mask follows this list.
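Here is a small sketch of steps 1 through 4, using SciPy's distance_transform_edt in place of MATLAB's bwdist (variable names are illustrative):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def seam_mask(mask1, mask2):
    """Binary blending mask for image 1 on the shared mosaic canvas.

    mask1, mask2: boolean arrays marking where each (already warped and
    aligned) image has valid pixels. The result is 1 where only image 1 has
    color, 0 where only image 2 has color, and, inside the overlap, 1 where
    the pixel lies deeper inside image 1 than image 2 (the bwdist idea).
    """
    overlap = mask1 & mask2
    # Distance from each valid pixel to its own image's boundary.
    d1 = distance_transform_edt(mask1)
    d2 = distance_transform_edt(mask2)

    blend = mask1 & ~overlap                 # pixels covered only by image 1
    blend |= overlap & (d1 >= d2)            # overlap pixels closer to image 1's interior
    return blend.astype(float)
```

The resulting mask then replaces the half-and-half mask in my project 2 multiresolution blend, with a Gaussian filter applied to it at each pyramid level (step 5).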





Project Description Part Two (Feature Matching for Autostitching)

Rather than manually labeling the correspondence points for warping one image into the projection of the other, the second part of this project labels correspondence points automatically. The algorithm is a simplified version of the one described in the paper Multi-Image Matching using Multi-Scale Oriented Patches.

Part 2.1 detects corner features using the Harris method; adaptive non-maximal suppression is then applied to refine the detected corners. Part 2.2 extracts a feature descriptor for each corner point: I first selected a 40 * 40 window around it, then resized the window to an 8 * 8 patch. Part 2.3 matches the feature descriptors between two images by computing the ratio of the first and second nearest neighbors under the sum-of-squared-distances metric. Part 2.4 then refines the matches with RANSAC to remove outliers. Part 2.5 finally uses the automatically labeled correspondence points to blend two images into a mosaic, as in part 1.5.

2.1 Detecting Corner Features in an Image

In this part, the Harris corner detection algorithm is provided. The outputs are shown below.
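I did not write the Harris detector myself, but a rough scikit-image equivalent of what the provided starter code does might look like the following (this is an assumption about the starter code, not its actual contents):

```python
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(gray_im, min_distance=2):
    """Harris response map plus the (row, col) coordinates of its local maxima."""
    h = corner_harris(gray_im, method='eps', sigma=1)
    coords = peak_local_max(h, min_distance=min_distance)
    return h, coords
```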

Because so many corners are detected, I used adaptive non-maximal suppression (ANMS) to subsample them, which both reduces the number of corners and keeps the remaining corners evenly distributed in space. In the kitchen example, I set the parameter so that the number of corners drops from roughly 4000 to 500. A sketch of the ANMS selection appears after the images below.


kitchen left Harris corners (4000)

kitchen right Harris corners (4000)

kitchen left ANMS (500)

kitchen right ANMS (500)
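Here is the sketch of ANMS mentioned above, assuming (N, 2) corner coordinates and their Harris responses (the robustness factor of 0.9 follows the paper; the exact code is illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression over Harris corners.

    coords: (N, 2) corner coordinates, strengths: (N,) Harris responses at
    those corners. Each corner's suppression radius is its distance to the
    nearest corner that is sufficiently stronger (by the factor c_robust);
    keeping the n_keep largest radii spreads the survivors out in space.
    """
    dists = cdist(coords, coords)
    # Corner j can suppress corner i only if it is clearly stronger.
    suppresses = c_robust * strengths[None, :] > strengths[:, None]
    dists[~suppresses] = np.inf
    radii = dists.min(axis=1)
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]
```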


2.2 Extracting a Feature Descriptor for Each Feature Point

This part extracts a feature descriptor for each corner point from part 2.1. The descriptor starts as a 40 * 40 window centered at the corner, which I resized down to an 8 * 8 patch with Gaussian blurring to protect against aliasing. Every patch is then normalized.
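A sketch of the descriptor extraction, where scikit-image's anti-aliased resize stands in for the explicit Gaussian blur and the normalization is taken to be zero mean / unit variance (an assumption about what "normalized" means here):

```python
import numpy as np
from skimage.transform import resize

def extract_descriptors(gray_im, coords, window=40, patch=8):
    """8x8 normalized descriptor from a 40x40 window around each corner.

    coords: (N, 2) integer (row, col) corner positions assumed to lie at
    least window/2 pixels away from the image border.
    """
    half = window // 2
    descriptors = []
    for r, c in coords:
        win = gray_im[r - half:r + half, c - half:c + half]
        # Anti-aliased resize applies a Gaussian blur before downsampling.
        small = resize(win, (patch, patch), anti_aliasing=True)
        small = (small - small.mean()) / (small.std() + 1e-8)  # bias/gain normalization
        descriptors.append(small.ravel())
    return np.array(descriptors)
```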

2.3 Matching Feature Descriptors Between Two Images

Using the feature descriptors from part 2.2, I matched corners between the two images by computing the ratio of the distance to the first nearest neighbor over the distance to the second nearest neighbor, under the sum-of-squared-distances metric on the descriptors. This is the e_1-NN / e_2-NN test proposed in the paper. Below are the matched corner points in the kitchen example and a pair of matching feature descriptors, followed by a sketch of the matching step. As we can easily see, the matched corner points are not perfect and still contain some outliers.


kitchen left
feature descriptor match

kitchen right
feature descriptor match

kitchen left
feature descriptor example

kitchen right
feature descriptor example
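A sketch of the ratio-test matching; the 0.6 threshold is an illustrative choice, not necessarily the one I used:

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_descriptors(desc1, desc2, ratio=0.6):
    """Lowe-style 1-NN / 2-NN ratio test under the SSD metric.

    Returns (i, j) index pairs where descriptor i of image 1 matches
    descriptor j of image 2 and e_1NN / e_2NN is below the ratio threshold.
    """
    ssd = cdist(desc1, desc2, metric='sqeuclidean')
    nn = np.argsort(ssd, axis=1)
    e1 = ssd[np.arange(len(desc1)), nn[:, 0]]   # distance to the best match
    e2 = ssd[np.arange(len(desc1)), nn[:, 1]]   # distance to the second best
    good = e1 / (e2 + 1e-12) < ratio
    return np.stack([np.where(good)[0], nn[good, 0]], axis=1)
```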


2.4 Use RANSAC to Compute a Homography

To remove the outliers mentioned in part 2.3, I used the random sample consensus (RANSAC) algorithm. It works by randomly selecting four corner matches and computing a homography from them. Assuming these four matches are correct, we use that homography to transform all of the corner points from image 1 to image 2, then compute the squared distance between each transformed image 1 point and its matched image 2 point. We count the number of correct matches (inliers) using a threshold, which I set to 1 pixel. We repeat this for a large number of iterations and keep the homography with the largest number of inliers. As shown below, the kitchen example now contains only the correct corner matches; a sketch of the RANSAC loop follows the images.


kitchen left final correspondence points

kitchen right final correspondence points
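A sketch of the RANSAC loop, reusing compute_homography from part 1.2; the iteration count is an arbitrary illustrative value, and the 1-pixel threshold matches the description above:

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=5000, thresh=1.0):
    """Keep the homography whose inlier set (error below thresh pixels) is largest.

    pts1, pts2: (N, 2) matched (x, y) points from the ratio test. The final
    homography is refit by least squares to all inliers of the best model.
    """
    n = len(pts1)
    ones = np.ones((n, 1))
    best_inliers = np.array([], dtype=int)
    for _ in range(n_iters):
        sample = np.random.choice(n, 4, replace=False)
        H = compute_homography(pts1[sample], pts2[sample])
        proj = H @ np.hstack([pts1, ones]).T
        proj = (proj[:2] / proj[2]).T
        errs = np.linalg.norm(proj - pts2, axis=1)   # reprojection error in pixels
        inliers = np.where(errs < thresh)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return compute_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```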


2.5 Proceed as in Part One to Produce a Mosaic

There is nothing special in this part. I blended the two images based on the automatically found correspondence points; the detailed steps are described in part 1.5.

Comparing the results of the manually labeled correspondences in part 1 with the automatically labeled correspondences in part 2, there are actually some differences. In general, the manually labeled results are better, as shown in the kitchen and pool examples: in both cases the outputs from manually labeled correspondences are almost perfect, whereas in the automatically labeled outputs the kitchen drawers stick out and the poolside is discontinuous. The two methods produce similar results in the parking example. Using the kitchen example, I offer a short explanation of the difference.

If we take a closer look at the correspondence points used to compute the homography in the two methods, there is an obvious difference: in the kitchen example, the manually labeled correspondence points are much more spread out, especially horizontally, than the automatically labeled ones. I believe that if I manually added two corner points to the automatically labeled set, one at the top right corner of the fridge and one between the two towels hanging on the oven, the result could be much better. The same explanation applies to the pool example. Although the automatic method produces only 6 matching correspondences in the kitchen example, the pool example actually has 20; I point this out because the issue is not the number of correspondence points found, but how they are spread out in space. The other solution to this problem is to shoot the two images with more overlap between them.


kitchen manually labeled blended

kitchen automatically labeled blended


pool manually labeled blended

pool automatically labeled blended


parking manually labeled blended

parking automatically labeled blended


What Have I Learned?

In part one, I was introduced to image mosaicing, which is fascinating. I did not expect the trick behind Google Maps Street View to be so simple. Implementation-wise, I had a difficult time with the mask in the multiresolution pyramid. I tried many things, including a weighted average of the overlapping region taken horizontally; however, that method does not remove any of the vertical seams, which led me to the bwdist method. The outputs are cool!

In part two, the implementation went pretty smoothly for me. I just had a difficult time debugging why the final output was not as good as the manually labeled one, and I ended up finding nothing wrong. I believe my explanation above makes sense.